The air conditioning system is complex and consumes the most energy 
in a building. Any fault in its operation, such as a faulty cooling tower 
fan, compressor failure or a stuck damper, could lead to energy 
wastage and a reduction in the system’s coefficient of performance (COP). 
Because of the complexity of the air conditioning system, detecting these 
faults is hard, as it requires exhaustive inspections. This paper consists of 
two parts: i) to investigate the impact of different faults related to the air 
conditioning system on COP and ii) to analyse the performance of machine 
learning algorithms in classifying those faults. Three supervised learning 
classifier models were developed: deep learning, support vector machine 
(SVM) and multi-layer perceptron (MLP). The performance of each 
classifier was investigated in terms of six different classes of faults. Results 
showed that different faults have different negative impacts on the COP. 
In addition, the three supervised learning classifier models were able to 
classify all faults with more than 94% accuracy, and MLP produced the 
highest accuracy and precision of all. 

1. INTRODUCTION 


Buildings account for about 40%-41% of total energy consumption, which is more than the 
energy used in the transportation and industrial sectors [1]. The heating, ventilation, and 
air-conditioning (HVAC) system is one of the leading energy consumers in a building and uses up to 50% 
of the total building energy consumption [1-2]. In Malaysia, office buildings use more energy for air 
conditioning than other buildings such as hotels or shopping complexes [3-4]. In practice, the 
coefficient of performance (COP) is used as an index to measure the performance of the air 
conditioning system [4]. The COP is the ratio of the rate of net heat removal to the rate of total energy input. 
A higher COP equates to lower operating costs, which means that the system uses less energy. 
Faults in the air conditioning system operation, such as a faulty cooling tower fan, 
compressor failure or a stuck damper, could lead to a lower COP and hence to wasted energy. 
It has been proposed that about 20% to 30% of energy can be saved by fixing HVAC faults. Fault 
detection and diagnostics (FDD) techniques can therefore be used to observe the operation of 
air-conditioning systems and, at the same time, detect abnormalities or faults. Extensive research on FDD 
for HVAC has been carried out in past years [5]. There are three types of FDD techniques for building 
systems: model-based, rule-based and data-driven techniques [6]. The model-based method uses first 
principles and simplified lumped-parameter models to model the HVAC system mathematically. This method 
requires detailed physical knowledge of the system. Its drawbacks are that it is complex to design and that 
the developed system and fault models are limited to that specific system only [7-8]. 

Meanwhile, in the rule-based method, expert knowledge is used to develop rules describing the 
system behaviour. This method can therefore restrict the portrayal of the system performance to certain 
faults only. To address this, rule-based and data-driven methods have been combined to detect and diagnose 
faults in the HVAC system, and the results show that the combined method has a much greater ability to 
identify and diagnose faults [9-10]. Lastly, the most popular method in previous research is the data-driven 
method [6]. This technique does not require any physical or expert knowledge to model the system. It uses 
historical data to train models, which reduces the modelling complexity. Data-driven methods have been 
successfully implemented to detect and diagnose faults in the air handling unit (AHU) [11-13]. 

Although there is a lot of previous research on FDD in HVAC, especially in chiller and AHU 
systems, until now no analysis has considered faults across the entire system [14]. A data-driven method 
using principal component analysis (PCA) was successfully implemented to detect and diagnose faults in 
the chiller system [15]. Model-based and non-model-based diagnostic algorithms for the AHU were compared 
using a Bayesian network diagnostic model [16]. Since recent data-driven methods require a longer 
computation time, [13] combined PCA and SVM to reduce the learning time for HVAC FDD. The proposed method 
was successfully tested on a commercial AHU system. A Wavelet-PCA method was introduced to detect faults 
in the AHU by eliminating the effect of changing weather conditions [11]. 

Several data-driven methods such as machine learning, artificial neural network (ANN) and support 
vector machine (SVM) are widely used in FDD. SVM is very efficient and widely used as a classifier [17]. 
In WEKA, the SVM classifier is also known as Sequential Minimal Optimization (SMO), which is a fast and 
straightforward method to train an SVM [18]. Meanwhile, the multi-layer perceptron (MLP) is a supervised 
learning classifier based on a feed-forward ANN trained with back-propagation, and it is the most 
frequently used in pattern recognition [17]. 

Fault detection and classification using deep learning was proposed for the Tennessee Eastman (TE) 
process in [19]. Twenty faults were introduced in the TE process and the performances of six classifiers 
were compared. The results show that the deep learning method outperforms the other five methods [19]. An 
FDD method using PCA was proposed to identify abnormalities from normal operation and isolate the 
variables related to faults in a chiller. The proposed method successfully identified four faults in the 
chiller [15]. The performances of decision tree, MLP, Naive Bayes, SMO, and instance-based K-nearest 
neighbour (KNN) classifiers in detecting breast cancer were compared in [17]. The results show that SMO 
has the highest accuracy as a single classifier [17]. An MLP was successfully implemented to detect high 
impedance faults in distribution networks [20]. The performances of Naive Bayes, Random Forest, Logistic 
Regression, MLP and KNN in predicting breast cancer were compared using WEKA. The results show that KNN 
is the most accurate classifier, followed by MLP as the second most accurate [21]. The performances of 
several algorithms in detecting breast cancer were also compared, and the results show that SVM has the 
highest accuracy of all [18, 22]. 

This paper aims to investigate the impact of different faults on COP and to analyse the performance 
of machine learning algorithms in detecting faults across the centralised chilled water air-conditioning 
system. The performances of three classifiers were investigated in terms of six classes of faults. The 
details of the research methodology used in this paper are explained in Section 2, which covers data 
collection, data classification, and data pre-processing procedures. The results are discussed and analysed 
in Section 3. Finally, Section 4 concludes the overall findings of this paper. 


2. RESEARCH METHODOLOGY 

This section describes the structure of the system and the research methodology involved 
in this work. The processes of data collection, data classification, and data pre-processing are explained 
in this section. 


2.1. Data collection 

A lab-scale chilled water system is used in this paper, as described in [23-25]. It is a prototype of a 
chiller system that consists of a cooling tower, a chiller, an AHU and two test rooms. The chiller system 
has a chilled water tank to supply chilled water to the cooling coil of the AHU. The cooling tower is 
designed as a counter flow type. The AHU system has an individual damper for each supply air duct. The 
speed of the AHU fan can be varied to achieve a specific supply air flow rate. The test rooms were made of 
insulated board and polycarbonate, and each measures 2.4 m x 1.2 m x 1.6 m. Five bulbs rated at 100 W each 
were installed in each test room to mimic the heat from equipment and occupants. The schematic diagram of 
this prototype is shown in Figure 1. Four types of sensors were used to collect data from the prototype: 
temperature sensors, air flow rate sensors, water flow rate sensors, and a current sensor. A total of 14 
sensors were installed in the system, and their distribution is shown in Figure 1. 

Figure 1. Schematic diagram of sensors located in the prototype system 


2.2. Data classification 

Five types of faults were generated in this paper. The collected data were divided into six 
condition classes. Class 1 was the normal condition, in which every element works perfectly. 
Class 2 was a faulty cooling tower fan, Class 3 compressor failure and Class 4 a stuck supply air damper. 
Finally, Class 5 was supplied chilled water clogging and Class 6 supply air ducting 
leakage. The details of all classes and the locations of the faults are tabulated in Table 1. The fault 
locations are spread across the whole system. All faults tested were shortlisted from previous studies and 
surveys conducted among air conditioning system contractors in Johor Bahru. The faults consist of abrupt 
and soft faults. An abrupt fault is easy to identify, whereas a soft fault is challenging to detect unless 
the degradation of performance is noticeable in terms of thermal comfort, equipment failure or excessive 
power consumption. Among the faults listed in Table 1, three were soft faults or degraded types of faults, 
whereas two were abrupt faults. 


Table 1. Details of each class 

Class   Condition                         Location of fault 
1       Normal (no fault)                 -- 
2       Cooling Tower Fan Failure         Cooling Tower 
3       Compressor Failure                Chiller 
4       Damper Stuck                      AHU 
5       Supplied Chilled Water Clogging   Chiller 
6       Air Ducting Leakage               AHU 


2.3. Data pre-processing 

The outputs of all 14 sensors installed in the lab-scale system were used as inputs to the machine 
learning models. Firstly, all data were normalised using the min-max feature scaling method. This prevents 
features with larger value ranges from having more influence on the training result. Data normalisation 
equalises the range as well as the variability of the data. The normalised data were then segmented into 
mean values for every 1 min interval. The data for all classes were combined, giving a total of 75180 data 
values with a dimension of 14 mean-feature attributes and 5370 instances. The data were randomly split 
using the WEKA toolkit into 70% for training and 30% for testing the model, corresponding to the 3760 
training and 1610 testing instances. Finally, a new dataset was used for validation purposes. 
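
The pre-processing steps above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the WEKA procedure used in this paper: the function names, the two-sensor sample readings and the window of three samples per interval are hypothetical, and the 0.7 training fraction is chosen to match the 3760 training and 1610 testing instances reported in Section 3.

```python
import random
from statistics import mean


def min_max_scale(column):
    """Rescale one sensor column to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(value - lo) / (hi - lo) for value in column]


def normalise(rows):
    """Apply min-max scaling to every column of a row-major table."""
    scaled_columns = [min_max_scale(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*scaled_columns)]


def segment_means(rows, samples_per_interval):
    """Average consecutive rows in fixed-size windows; a trailing partial window is dropped."""
    n_windows = len(rows) // samples_per_interval
    segments = []
    for k in range(n_windows):
        window = rows[k * samples_per_interval:(k + 1) * samples_per_interval]
        segments.append([mean(col) for col in zip(*window)])
    return segments


def random_split(instances, train_fraction, seed=0):
    """Shuffle a copy of the instances and cut it into training and testing parts."""
    shuffled = list(instances)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical readings: 6 samples from 2 sensors (the real system has 14 sensors).
raw = [[20.0, 1.0], [22.0, 3.0], [24.0, 5.0], [26.0, 7.0], [28.0, 9.0], [30.0, 11.0]]

# Normalise first, then take the mean of each interval, as in the text.
features = segment_means(normalise(raw), samples_per_interval=3)

# Hypothetical set of 1000 instance indices split 70% / 30%.
train, test = random_split(range(1000), train_fraction=0.7)
```

Scaling before averaging follows the order described in the text. Because min-max scaling is a linear map, averaging first and scaling afterwards would give the same segment means as long as the same minimum and maximum are used.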


2.4. Simulation setup 

The deep learning, support vector machine (SVM) and multi-layer perceptron (MLP) models 
were built using the WEKA toolkit [26]. For the deep learning model, the optimization algorithm was 
stochastic gradient descent (SGD), the activation function for the hidden layers was sigmoid and that for 
the output layer was softmax. Meanwhile, the kernel function set for SVM was the polykernel, which is 
reported to be the best for HVAC FDD [13]. Lastly, for MLP, the activation function was sigmoid, the weight 
was set to 0.3, and the training time was 500 epochs. The number of hidden nodes used in this paper was 
ten, as formulated in (1). Table 2 shows the parameter settings for the simulation. 


\[ \text{no. of hidden nodes} = \frac{\text{no. of attributes} + \text{no. of classes}}{2} \tag{1} \]


Table 2. Parameter setting for simulation 

Models          Parameter setting 
Deep learning   Activation function for hidden layer: sigmoid 
                Activation function for output layer: softmax 
                Optimization: Stochastic Gradient Descent (SGD) 
SVM             Kernel function: polykernel 
MLP             Activation function: sigmoid 

3. RESULTS AND ANALYSIS 

The first part of this section presents the energy consumption and COP of the system, followed by 
the performance analysis of the three classifier models. A total of 3760 instances were used to train the 
classifiers, 1610 instances to evaluate the models and 1344 instances to validate them. The accuracy and 
precision of each model are explained and analysed in detail. Accuracy is the number of correctly 
classified instances over the total number of instances, while precision is the number of correctly 
classified instances of a class among all the instances assigned to that class. 
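
As a rough check on how these two measures relate to a confusion matrix, the sketch below computes them for the training confusion matrix of the deep learning model (Table 4). Two assumptions are made that are not stated in the text: rows are taken as the actual classes and columns as the predicted classes, and the overall precision is taken as the per-class precision averaged with weights equal to the number of instances in each class. The function names are illustrative.

```python
def accuracy(cm):
    """Correctly classified instances (the diagonal) over all instances."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


def weighted_precision(cm):
    """Per-class precision, averaged with weights equal to each class's support."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    score = 0.0
    for j in range(n):
        predicted_j = sum(cm[i][j] for i in range(n))   # column sum
        support_j = sum(cm[j])                          # row sum
        if predicted_j:
            score += (cm[j][j] / predicted_j) * support_j
    return score / total


# Table 4: rows assumed to be actual classes 1-6, columns predicted classes 1-6.
table_4 = [
    [673,   0,   0,   5,  19,   8],
    [ 10, 607,  44,   7,  32,   5],
    [  3,  37, 422,   0,   8,   0],
    [  0,   0,   0, 469,   1,   0],
    [  4,   0,   0,   0, 697,   4],
    [ 16,   0,   0,  16,   6, 667],
]

acc = accuracy(table_4)
prec = weighted_precision(table_4)
print(f"accuracy = {100 * acc:.1f}%, precision = {100 * prec:.1f}%")
# prints: accuracy = 94.0%, precision = 94.1%
```

Under these assumptions the values agree with the 94% accuracy and 94.1% precision reported for the training model in Section 3.2.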


3.1. Energy consumption and COP 

Previous researchers have identified that faults in the air conditioning system could lead to energy 
wastage. During the experiments, the energy consumption of the prototype system was logged to measure the 
performance of the system. The performance of an air-conditioning system is measured by the coefficient of 
performance (COP) [4], which can be calculated as in (2). 


\[ \mathrm{COP} = \frac{\text{Total heat load}}{\text{Total electrical load}} \tag{2} \]


Table 3 shows the energy consumption and the COP of the system recorded for an hour. The COP 
of the system decreased when faults were injected into the system. Under the normal condition, the COP of 
the system was 3.38, but the performance degraded when the system had faults. This will lead to energy 
wastage in the long run if no proper action is taken. Since the air-conditioning system is complex, 
it is crucial to have an FDD system to monitor any abnormalities in operation. 


Table 3. Total energy consumption and COP of the prototype system 

Condition                         Total Energy Consumption (kWh)   COP 
Normal (no fault)                 0.81                             3.38 
Cooling Tower Fan Faulty          0.80                             1.87 
Damper Stuck                      0.81                             1.16 
Supplied Chilled Water Clogging   0.76                             1.50 
Air Ducting Leakage               0.82                             2.44 
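
The size of the degradation can be made explicit by expressing each faulty COP in Table 3 as a percentage drop from the normal COP. The sketch below does this and also restates (2) as a function. The helper names are illustrative, and the percentage drop is a derived quantity that is not reported in the table itself.

```python
def cop(total_heat_load, total_electrical_load):
    """Equation (2): heat load divided by electrical load (same units for both)."""
    if total_electrical_load <= 0:
        raise ValueError("electrical load must be positive")
    return total_heat_load / total_electrical_load


def cop_drop_percent(cop_normal, cop_fault):
    """Relative reduction of the COP with respect to the normal condition."""
    return 100.0 * (cop_normal - cop_fault) / cop_normal


# COP values taken from Table 3.
cop_normal = 3.38
cop_faulty = {
    "Cooling Tower Fan Faulty": 1.87,
    "Damper Stuck": 1.16,
    "Supplied Chilled Water Clogging": 1.50,
    "Air Ducting Leakage": 2.44,
}

drops = {name: cop_drop_percent(cop_normal, value) for name, value in cop_faulty.items()}
worst_fault = max(drops, key=drops.get)

for name, drop in drops.items():
    print(f"{name}: COP reduced by {drop:.1f}%")
```

On the Table 3 values, the stuck damper gives the largest reduction and the air ducting leakage the smallest.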


3.2. Deep learning 
Figure 2 shows the overall performance of the deep learning model. The training model was able to 
classify all classes with an accuracy of 94% and a precision of 94.1%. Meanwhile, the evaluation and the 
validation of the model both achieved more than 93% for accuracy and precision. Tables 4 to 6 
show the confusion matrices for training, testing and validating the model. The confusion 
matrix is widely used to represent the accuracy of a classifier [17]. It indicates the correspondence 
between the predicted and the expected classes. Tables 4 to 6 show that Class 2 and Class 3 had 
the lowest accuracy for this model. 


Table 4. Confusion matrix for the training dataset 

Class     1     2     3     4     5     6 
1       673     0     0     5    19     8 
2        10   607    44     7    32     5 
3         3    37   422     0     8     0 
4         0     0     0   469     1     0 
5         4     0     0     0   697     4 
6        16     0     0    16     6   667 


Table 5. Confusion matrix for the testing dataset 

Class     1     2     3     4     5     6 
1       286     0     0     5     7     4 
2         4   262    14     6    16     0 
3         2    20   172     0     7     0 
4         0     0     0   200     1     0 
5         4     0     0     0   296     2 
6         7     0     0     8     5   282 


Table 6. Confusion matrix for validating dataset of the deep learning model 

Class     1     2     3     4     5     6 
1       240     0     0     0     7     5 
2         4   218    21     0     9     0 
3         2    15   149     0     2     0 
4         0     0     0   165     3     0 
5         1     0     0     0   251     0 
6         5     0     0     2     5   240 


[Bar chart: overall performance of the deep learning model, showing accuracy and precision (%) for the training, testing and validating datasets] 

Figure 2. Percentage of accuracy and precision of 
training and testing data using deep learning method 
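
The observation that Class 2 and Class 3 are the weakest classes can be checked directly from a confusion matrix. The sketch below uses the validation matrix of the deep learning model (Table 6) and interprets the accuracy of a class as the fraction of its instances that were classified correctly, assuming that rows are the actual classes. Both the interpretation and the function name are assumptions made for illustration.

```python
def per_class_rate(cm):
    """Fraction of each actual class (row) that lies on the diagonal."""
    return {i + 1: row[i] / sum(row) for i, row in enumerate(cm)}


# Table 6: rows assumed to be actual classes 1-6, columns predicted classes 1-6.
table_6 = [
    [240,   0,   0,   0,   7,   5],
    [  4, 218,  21,   0,   9,   0],
    [  2,  15, 149,   0,   2,   0],
    [  0,   0,   0, 165,   3,   0],
    [  1,   0,   0,   0, 251,   0],
    [  5,   0,   0,   2,   5, 240],
]

rates = per_class_rate(table_6)
lowest_two = sorted(rates, key=rates.get)[:2]

for label, rate in rates.items():
    print(f"Class {label}: {100 * rate:.1f}%")
print("Lowest two classes:", lowest_two)
```

Most of the errors for these two classes are mutual: Class 2 instances are mainly mislabelled as Class 3 and vice versa, which is visible in the off-diagonal entries 21 and 15.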


3.3. Support vector machine 

Figure 3 shows the results of the SVM. The overall accuracy of the training model increased compared 
to the deep learning model, reaching 97% accuracy with a precision of 97.1%. This shows 
that SVM performs better than deep learning in classifying the faults of the system. The detailed 
accuracy of SVM is tabulated in Tables 7-9. The accuracy of Class 2 and Class 3 also increased 
considerably, by 5% - 6%, compared to the deep learning model. 


Table 7. Confusion matrix for the training dataset 

Class     1     2     3     4     5     6 
1       686     0     0     3    14     2 
2        10   642    21     0    32     0 
3         0     7   455     0     8     0 
4         0     0     0   470     0     0 
5         0     0     0     0   705     0 
6         2     0     0     5     8   690 


Table 8. Confusion matrix for the testing dataset 

Class     1     2     3     4     5     6 
1       291     0     0     3     7     1 
2         5   274     7     2    14     0 
3         1     4   189     0     7     0 
4         0     0     0   200     0     1 
5         0     0     0     0   302     0 
6         1     0     0     3     6   292 


Table 9. Confusion matrix for validating dataset of the SVM method 

Class     1     2     3     4     5     6 
2         5   231     8     0     8     0 


[Bar chart: overall performance of SVM model] 

Figure 3. Percentage of accuracy and precision of training 
and testing data using the SVM model 


3.4. Multi-layer perceptron 

As shown in Figure 4, the accuracy and precision of the model were 99.4%, which are the highest 
among the three models. Furthermore, the accuracy varied little between classes, unlike in 
the previous models. The accuracy of each class is shown in Tables 10-12. 


Table 10. Confusion matrix for 70% training data 

Class     1     2     3     4     5     6 
1       695     1     2     0     6     1 
2         0   705     0     0     0     0 
3         0     2   468     0     0     0 
4         0     0     0   468     1     1 
5         0     0     1     0   704     0 
6         0     0     0     0     6   699 


Table 11. Confusion matrix for 30% testing data 

Class     1     2     3     4     5     6 
1       297     0     1     0     4     0 
2         0   301     1     0     0     0 
3         1     3   197     0     0     0 
4         0     0     0   200     1     0 
5         0     0     4     0   298     0 
6         0     0     0     0     5   297 


Table 12. Confusion matrix for validation result of the MLP model 

Class     1     2     3     4     5     6 
1       246     2     2     0     2     0 
2         0   252     0     0     0     0 
3         0     0   168     0     0     0 
4         0     0     0   164     3     1 
5         0     0     1     0   251     0 


[Bar chart: overall performance of MLP model, showing accuracy and precision (%)] 

Figure 4. Percentage of accuracy and precision of 
training and testing data using the MLP model 


Figure 5 shows the overall performance of all the classifier models used in this paper. It clearly 
shows that MLP has the best accuracy and precision in classifying all six classes introduced in this 
paper. 

Figure 5. Performance comparison among deep learning, SVM and MLP 


4. CONCLUSION 

This paper showed that the system’s COP degrades to different values when different faults 
occur in the air-conditioning system. The performances of machine learning algorithms in detecting and 
classifying different faults were also investigated. Three algorithms were applied to the dataset from the 
lab-scale prototype air-conditioning system. The simulation results show that the MLP has the best 
accuracy and precision, up to 99.4%, compared with SVM and deep learning. The second most accurate 
classifier was SVM, which correctly classified up to 97% of the data. 